We discuss how inference can be performed when data are sampled from the non-ergodic phase of 
systems with multiple attractors. We take as model system the finite connectivity Hopfield model 
in the memory phase and suggest a cavity method approach to reconstruct the couplings when the 
data are separately sampled from few attractor states. We also show how the inference results can 
be converted into a learning protocol for neural networks in which patterns are presented through 
weak external fields. The protocol is simple and fully local, and is able to store patterns with a 
finite overlap with the input patterns without ever reaching a spin glass phase where all memories 
are lost. 
I. INTRODUCTION 

The problem of inferring interaction couplings in complex systems arises from the huge quantity of empirical data 
that are being made available in many fields of science and from the difficulty of making systematic measurements of 
interactions. In biology, for example, empirical data on neuron populations, small molecules, proteins and genetic 
interactions have largely outgrown the understanding of the underlying system mechanisms. In all these cases the 
inverse problem, whose aim is to infer some effective model from the empirical data with just partial a priori knowledge, 
is of course extremely relevant. 

Statistical physics, with its set of well-understood theoretical models, plays a crucial role in providing complex but 
clear-cut benchmarks whose direct solution is known and which can therefore be used to develop and test new inference 
methods. 

In a nutshell, the equilibrium approach to inverse problems consists in inferring some information about a system 
defined through an energy function $H(\sigma)$ starting from a set of sampled equilibrium configurations. Suppose $M$ 
i.i.d. sampled configurations $S = \{\sigma^{(i)}\}_{i=1,\dots,M}$, generated by a Boltzmann distribution of an unknown energy function 
$H$, $P(\sigma) = \frac{1}{Z_H} e^{-H(\sigma)}$, are given. The posterior distribution of $H$ (also called likelihood) is given by 
$P(H|S) \propto \exp\left(-M(\langle H \rangle + \log Z_H)\right) P(H)$, where $\langle H \rangle = \frac{1}{M} \sum_{i=1}^{M} H(\sigma^{(i)})$ represents the average of $H$ over the given sample 
configurations and $P(H)$ the prior knowledge about $H$. The parameter $M$ plays the role of an inverse temperature: 
when $M$ is very large, $P(H|S)$ peaks on the maxima (with respect to $H$) of the 'log-likelihood' $\mathcal{L} = -\langle H \rangle - \log Z_H$. 
The problem of identifying the maximum of $\mathcal{L}$ can be thought of as an optimization problem, normally very difficult 
both because the space of $H$ is large and because $\log Z_H$ is itself very difficult to estimate for a candidate solution. 

Several methodological advances and stimulating preliminary applications in neuroscience have been put forward in 
the last few years [ll-Q . Still the field presents several major conceptual and methodological open problems related to 
both the efficiency and the accuracy of the methods. One problem which we consider here is how to perform inference 
when data are not coming from a uniform sampling over the equilibrium configurations of a system but rather they 
are taken from a subset of all the attractive states. This case arises for instance when we consider systems with 
multiple attractors and we want to reconstruct the interaction couplings from measurements coming from a subset 
of the attractors. 

In what follows, we take as model system the Hopfield model over random graphs in its memory phase (i.e. with 
multiple equilibrium states) and show how the interaction couplings can be inferred from data taken from a subset of 
memories (states). This will be done by employing the Bethe equations (normally used in the ergodic phase where they 
are asymptotically exact) by taking advantage of a certain property of their multiple fixed points in the non-ergodic 
phase. The method can be used to infer both couplings and external local fields. 

We also show how from the inference method one can derive a simple unsupervised learning protocol which is able 
to learn patterns in the presence of weak and highly fluctuating input signals, without ever reaching a spin-glass-like 
saturation regime in which all the memories are lost. The technique that we will discuss is based on the so-called 
cavity method and leads to a distributed algorithmic implementation generically known as a message-passing scheme. 

The paper is organized as follows. First, in Section II we define the problem and make connections with related 
works. Section III is concerned with the inference problem in non-ergodic regimes, for which a simple algorithmic 
approach is presented. In Section IV we apply the technique to the finite connectivity Hopfield model in the memory 
phase. Section V shows how the approach can be turned into an unsupervised learning protocol. Conclusions and 
perspectives are given in Section VI. 



II. THE INVERSE HOPFIELD PROBLEM 

The Hopfield model is a simple neural network model with pair-wise interactions which behaves as an attractor 
associative memory [5]. Its phase diagram is known exactly when memories are random patterns and the model is 
defined over either fully connected or sparse graphs jQ, lS] . Reconstructing interactions in the Hopfield model from 
partial data thus represents a natural benchmark problem for tools which are intended to be applied to data coming from 
multi-electrode measurements of large collections of neurons. The underlying idea is that a statistically consistent 
interacting model (like the Hopfield model) inferred from the data could capture some aspects of the system which 
are not easy to grasp from the raw data. Here we limit our analysis to artificial data. 

In the Hopfield model the couplings Jij are given by the covariance matrix of a set of random patterns which 
represent the memories to be stored in the system. We will use the model to generate data through sampling and we 
will aim at inferring the couplings. 

The structure of the phase space of the Hopfield model at sufficiently low temperature and for a not too large 
number of patterns is divided into clusters of configurations which are highly correlated with the patterns. We will 
proceed by sampling configurations from a subset of clusters and try to infer the interactions. The simple observation 
that we want to exploit is the fact that fluctuations within single clusters are heavily influenced by the existence of 
other clusters and thus contain information about the total system. 

We consider a system of $N$ binary neurons $\sigma_i \in \{-1,+1\}$ (or Ising spins) interacting over a random regular graph 
of degree $K$; that is, every node has a fixed number $K$ of neighbors which are selected randomly. The connectivity 
pattern is defined by the elements of the adjacency matrix, which take values in $\{0, 1\}$. The (symmetric) interactions between two 
neighboring neurons are given by the Hebb rule, i.e. $J_{ij} = J_{ji} = \frac{1}{K} \sum_{\mu=1}^{P} \xi_i^\mu \xi_j^\mu$, where $\xi^\mu$ are the patterns to be 
memorized and $P$ is their number. At finite temperature $T = 1/\beta$, we simulate the system by a Glauber dynamics: 
starting from an initial configuration, the spins are chosen in a random sequential way and flipped with the following 
transition probability 

$$W(\sigma_i \to -\sigma_i) = \frac{1 - \sigma_i \tanh(\beta h_i)}{2}, \tag{1}$$

where 

$$h_i = \theta_i + \sum_{j \in \partial i} J_{ij} \sigma_j \tag{2}$$

is the local field experienced by spin $i$ and $\theta_i$ is an external field. We use $\partial i$ to denote the set of neighbors interacting 
with $i$. The process satisfies detailed balance and at equilibrium the probability of steady state configurations $\sigma$ is 
given by the Gibbs measure 

$$P(\sigma) = \frac{1}{Z[J,\theta]} e^{\beta \sum_i \theta_i \sigma_i + \beta \sum_{i<j} J_{ij} \sigma_i \sigma_j}, \tag{3}$$

where $Z[J,\theta]$ is a normalization constant, or partition function. In the memory phase, the system will explore 
configurations close to a pattern, provided that the initial configuration lies in the basin of attraction of that pattern. 
In the following we use the terms patterns, basins of attraction and states interchangeably. In a given state $\mu$ the average 
activity (or magnetization) and correlations are denoted by $m_i^\mu = \langle \sigma_i \rangle_\mu$ and $c_{ij}^\mu = \langle \sigma_i \sigma_j \rangle_\mu$, where the averages are 
taken with the Gibbs measure inside that state. Informally, a Gibbs state corresponds to a stationary state of the 
system, and is thus defined by the average macroscopic quantities in that state. 
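
The Hebb rule and the single-spin update of Eqs. (1)-(2) can be sketched as follows. This is a minimal illustration, not the simulation code used for the figures: the function names, the edge-list representation and the choice of one "sweep" as $N$ single-spin attempts are assumptions made here for clarity.

```python
import math
import random

def hebb_couplings(edges, patterns, K):
    """Hebb rule on the edges of the graph: J_ij = (1/K) * sum_mu xi_i^mu xi_j^mu."""
    return {(i, j): sum(xi[i] * xi[j] for xi in patterns) / K for i, j in edges}

def neighbor_lists(n, couplings):
    """For each spin, the list of (neighbor, coupling) pairs."""
    nbrs = [[] for _ in range(n)]
    for (i, j), J in couplings.items():
        nbrs[i].append((j, J))
        nbrs[j].append((i, J))
    return nbrs

def glauber_sweep(sigma, nbrs, theta, beta, rng):
    """N random-sequential update attempts; sigma is modified in place."""
    n = len(sigma)
    for _ in range(n):
        i = rng.randrange(n)
        # local field of Eq. (2)
        h = theta[i] + sum(J * sigma[j] for j, J in nbrs[i])
        # flip probability of Eq. (1)
        p_flip = 0.5 * (1.0 - sigma[i] * math.tanh(beta * h))
        if rng.random() < p_flip:
            sigma[i] = -sigma[i]
    return sigma
```

Starting `sigma` from a stored pattern and calling `glauber_sweep` repeatedly keeps the configuration inside the corresponding basin at low enough temperature, which is how samples restricted to a single state would be collected in this sketch.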

Suppose that starting from some random initial configuration we observe the system for a long time, measuring 
$M$ configurations. The standard way to infer the interaction couplings $J$ and external fields $\theta$ is by maximizing the 
log-likelihood of $(J,\theta)$ given the experimental data [1, 0], namely 

$$\mathcal{L}(J,\theta) = \beta \sum_i \theta_i m_i^{\rm exp} + \beta \sum_{i<j} J_{ij} c_{ij}^{\rm exp} + \beta F[J,\theta], \tag{4}$$

where $m_i^{\rm exp} = \frac{1}{M} \sum_{s=1}^{M} \sigma_i^s$ and $c_{ij}^{\rm exp} = \frac{1}{M} \sum_{s=1}^{M} \sigma_i^s \sigma_j^s$ are the experimental magnetizations and correlations and 
$F = -\frac{1}{\beta} \log Z$ is the free energy. 
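
For very small systems the log-likelihood can be evaluated directly, which makes its structure concrete. The sketch below assumes the form $\beta \sum_i \theta_i m_i^{\rm exp} + \beta \sum_{i<j} J_{ij} c_{ij}^{\rm exp} - \log Z$ and computes $\log Z$ by brute-force enumeration; the function names are illustrative, and the enumeration is only feasible for a handful of spins.

```python
import itertools
import math

def empirical_moments(samples, edges):
    """Experimental magnetizations m_i and edge correlations c_ij from M samples."""
    M = len(samples)
    n = len(samples[0])
    m = [sum(s[i] for s in samples) / M for i in range(n)]
    c = {(i, j): sum(s[i] * s[j] for s in samples) / M for i, j in edges}
    return m, c

def log_partition(J, theta, beta, n):
    """log Z by enumerating all 2^n configurations (small n only)."""
    total = 0.0
    for s in itertools.product((-1, 1), repeat=n):
        e = sum(theta[i] * s[i] for i in range(n))
        e += sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
        total += math.exp(beta * e)
    return math.log(total)

def log_likelihood(J, theta, beta, m_exp, c_exp):
    """Per-sample log-likelihood of (J, theta) given the empirical moments."""
    n = len(theta)
    data_term = sum(theta[i] * m_exp[i] for i in range(n))
    data_term += sum(Jij * c_exp[e] for e, Jij in J.items())
    return beta * data_term - log_partition(J, theta, beta, n)
```

For two spins in zero field with $c^{\rm exp} = \tanh(\beta J^*)$, this function is maximized at $J = J^*$, as expected from its concavity.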

One can exploit the concavity of the log-likelihood and use a gradient ascent algorithm to find the unique parameters 
maximizing the function. However, this needs an efficient algorithm to compute derivatives of the free energy $F[J, \theta]$, 
which in general is a difficult task. A well-known technique which can be used for systems that are not too large is 
the Monte Carlo method (see e.g. 0). Under certain limiting assumptions, however, there exist good approximation 
techniques which are efficient, namely mean field, small-correlation and large-field expansions [3, 4, 10-13]. 

In this paper we resort to the mean-field cavity method, or Belief Propagation (BP), to compute the log-likelihood 
(see e.g. [15-18]). This technique is closely related to the Thouless-Anderson-Palmer (TAP) approximation in the spin 
glass literature [19, 20]. The approximation is exact on tree graphs and asymptotically correct as long as the graph 
is locally tree-like or the correlations between variables are sufficiently weak. In spin glass jargon, the approximation 
works well in the so-called replica symmetric phase. 

In the BP approach, the marginals of variables and their joint probability distribution (which is assumed to take 
a factorized form where only pair correlations are kept) are estimated by solving a set of self-consistency functional 
equations, by exchanging messages along the edges of the interaction graph (see Ref. [18] for a comprehensive review). A 
message (typically called "cavity belief") $\pi_{i \to j}(\sigma_i)$ is the probability that spin $i$ takes state $\sigma_i$ ignoring the interaction 
with its neighbor $j$, i.e. in a cavity graph. We call this probability distribution a cavity message. Assuming a tree 
interaction graph we can write an equation for $\pi_{i \to j}(\sigma_i)$ relating it to the other cavity messages $\pi_{k \to i}(\sigma_k)$ sent to $i$: 

$$\pi_{i \to j}(\sigma_i) \propto e^{\beta \theta_i \sigma_i} \prod_{k \in \partial i \setminus j} \left[ \sum_{\sigma_k} e^{\beta J_{ik} \sigma_i \sigma_k} \pi_{k \to i}(\sigma_k) \right], \tag{5}$$
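as in cavity graphs the neighboring variables are independent of each other. These are the BP equations and can be 
used even in loopy graphs to estimate the local marginals. The equations are solved by starting from random initial 
values for the cavity messages and updating them in some random sequential order until a fixed point is reached. Upon 
convergence the cavity messages are used to obtain the local marginals or "beliefs": 

$$\pi_i(\sigma_i) \propto e^{\beta \theta_i \sigma_i} \prod_{k \in \partial i} \left[ \sum_{\sigma_k} e^{\beta J_{ik} \sigma_i \sigma_k} \pi_{k \to i}(\sigma_k) \right], \qquad \pi_{ij}(\sigma_i, \sigma_j) \propto e^{\beta J_{ij} \sigma_i \sigma_j} \pi_{i \to j}(\sigma_i) \pi_{j \to i}(\sigma_j). \tag{6}$$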

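These marginals are enough to compute the magnetizations $m_i$ and correlations $c_{ij}$, and thus can be used for maximizing 
the log-likelihood by updating the parameters as 

$$\theta_i \leftarrow \theta_i + \eta\,(m_i^{\rm exp} - m_i), \qquad J_{ij} \leftarrow J_{ij} + \eta\,(c_{ij}^{\rm exp} - c_{ij}), \tag{7}$$

with $\eta \ll 1$ and positive. Repeating this procedure a sufficient number of times leads to an estimate of the unknown parameters. 
Assuming that the external fields are absent, the inference error can be written as: 

$$\Delta_J = \sqrt{\frac{\sum_{i<j} \left( J_{ij}^{\rm true} - J_{ij} \right)^2}{KN}}. \tag{8}$$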
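
A compact way to iterate the BP equations for Ising spins is to parametrize each cavity message by a single cavity field, $\pi_{i \to j}(\sigma_i) \propto e^{\beta h_{i \to j} \sigma_i}$. The sketch below uses this standard parametrization (a choice made here, not necessarily the one used by the authors), returns BP magnetizations and edge correlations, and applies one step of the update (7). All function names are illustrative.

```python
import math

def run_bp(n, J, theta, beta, h_init=None, sweeps=200, tol=1e-12):
    """Iterate cavity fields h[(i, j)] (message i -> j) until they stop changing."""
    nbrs = {i: [] for i in range(n)}
    for (i, j), Jij in J.items():
        nbrs[i].append((j, Jij))
        nbrs[j].append((i, Jij))
    h = {(i, j): (0.0 if h_init is None else h_init[(i, j)])
         for i in range(n) for j, _ in nbrs[i]}

    def u(Jik, hki):
        # effective field induced on i by neighbor k through the coupling J_ik
        x = math.tanh(beta * Jik) * math.tanh(beta * hki)
        x = max(-1.0 + 1e-15, min(1.0 - 1e-15, x))  # keep atanh finite
        return math.atanh(x) / beta

    for _ in range(sweeps):
        delta = 0.0
        for (i, j) in list(h):
            new = theta[i] + sum(u(Jik, h[(k, i)]) for k, Jik in nbrs[i] if k != j)
            delta = max(delta, abs(new - h[(i, j)]))
            h[(i, j)] = new
        if delta < tol:
            break

    # one-point beliefs -> magnetizations
    m = [math.tanh(beta * (theta[i] + sum(u(Jik, h[(k, i)]) for k, Jik in nbrs[i])))
         for i in range(n)]
    # two-point beliefs -> correlations on the edges
    c = {}
    for (i, j), Jij in J.items():
        t = math.tanh(beta * Jij)
        a, b = math.tanh(beta * h[(i, j)]), math.tanh(beta * h[(j, i)])
        c[(i, j)] = (t + a * b) / (1.0 + t * a * b)
    return m, c, h

def gradient_step(J, theta, m_exp, c_exp, m, c, eta):
    """One application of the update rule (7)."""
    new_theta = [theta[i] + eta * (m_exp[i] - m[i]) for i in range(len(theta))]
    new_J = {e: J[e] + eta * (c_exp[e] - c[e]) for e in J}
    return new_J, new_theta
```

On a tree the returned magnetizations and correlations coincide with the exact ones; on loopy graphs they are the Bethe estimates. The optional `h_init` argument lets the iteration start from messages biased toward a given state, which matters in the non-ergodic phase where several fixed points exist.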

A more accurate estimate of the correlations can be obtained by exploiting the Fluctuation-Response theorem 
$c_{ij} = \partial m_i / \partial \theta_j$. This method, called Susceptibility Propagation [1^, [21| , uses derivatives of cavity messages (cavity 
susceptibilities). Its time complexity grows as $KN^2$, to be compared with the $KN$ complexity of the BP equations. 

In this paper we will work exclusively with the BP estimate, which is simple and accurate enough for our studies. 
Actually, if one is interested only in correlations along the edges of a sparse graph, the BP estimate is as 
good as the one obtained by susceptibility propagation. The reader can find more on susceptibility propagation 

in [H,!!!. 



III. INFERENCE IN THE NON-ERGODIC PHASE 



In an ergodic phase a system visits the whole configuration space, and sampling for a long time is well represented by the 
measure in (3). 

In a non-ergodic phase, as happens for the Hopfield model in the memory phase, the steady state of a system is 
determined by the initial conditions. Starting from a configuration close to pattern $\alpha$, the system spends most of 
its time (depending on the size of the system) in that state. We indicate the probability measure which describes such 
a situation by $\mathcal{P}_\alpha(\sigma)$, that is, the Gibbs measure restricted to state $\alpha$. If configurations are sampled from one state, 
then the expression for the log-likelihood in (4) should be corrected by replacing $F$ with $F_\alpha$, the free energy of state 
$\alpha$. The log-likelihood is still a concave function of its arguments and so there is a unique solution to this problem. 

It is well known that the Bethe approximation is asymptotically exact in the ergodic phase ((3)- In this case, the 
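Gibbs weight can be approximately expressed in terms of one- and two-point marginals $P_{ij}(\sigma_i, \sigma_j)$, $P_i(\sigma_i)$ as follows: 

$$P(\sigma) \approx \prod_i P_i(\sigma_i) \prod_{(ij)} \frac{P_{ij}(\sigma_i, \sigma_j)}{P_i(\sigma_i) P_j(\sigma_j)}, \tag{9}$$

where the second product runs over the edges of the interaction graph. 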

The above equation is exact only asymptotically (on a replica-symmetric system); it can be used for inference in at 
least two ways. The simplest one is to replace $P_{ij}(\sigma_i,\sigma_j)$ and $P_i(\sigma_i)$ in the above expression by their experimental 
estimates (given as input to the inference process), equate (9) to (3) and solve for $J$ and $\theta$. This is known as the 
"independent pairs" approximation. A second one, often more precise but computationally more involved, is to search 
for a set of $J$, $\theta$ and a corresponding fixed point of the BP equations, such that the Bethe estimates $\pi_{ij}(\sigma_i, \sigma_j)$, $\pi_i(\sigma_i)$ 
of the two- and one-point marginals match the experimental input as accurately as possible. 
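
For the independent pairs approximation, equating the $\sigma_i \sigma_j$ term of the Bethe form to that of the Gibbs weight gives $J_{ij} = \frac{1}{4\beta} \ln \frac{P_{ij}(+,+) P_{ij}(-,-)}{P_{ij}(+,-) P_{ij}(-,+)}$. The sketch below evaluates this from measured magnetizations and a correlation, using the general form of a two-spin marginal for $\pm 1$ variables; the function names are illustrative.

```python
import math

def pair_marginal(m_i, m_j, c_ij, s, t):
    """Two-spin marginal P_ij(s, t) for s, t in {-1, +1}, fixed by its first two moments."""
    return (1.0 + m_i * s + m_j * t + c_ij * s * t) / 4.0

def coupling_from_moments(m_i, m_j, c_ij, beta):
    """Independent-pair estimate of J_ij from magnetizations and a correlation."""
    p_pp = pair_marginal(m_i, m_j, c_ij, +1, +1)
    p_mm = pair_marginal(m_i, m_j, c_ij, -1, -1)
    p_pm = pair_marginal(m_i, m_j, c_ij, +1, -1)
    p_mp = pair_marginal(m_i, m_j, c_ij, -1, +1)
    return math.log((p_pp * p_mm) / (p_pm * p_mp)) / (4.0 * beta)
```

For an isolated pair of spins the estimate is exact, whatever the external fields; on a larger graph it is only as good as the Bethe factorization itself and it requires all four marginal entries to be strictly positive.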

In a non-ergodic phase, however, it is known that the BP equations typically do not converge or have multiple fixed 
points. This is normally attributed to the fact that the BP hypothesis of decorrelation of cavity marginals fails to be 
true. When a BP fixed point is attained, it is believed to approximate marginals inside a single state (and not the 
full Gibbs probability), as the decorrelation hypothesis is satisfied once statistics are restricted to this state [18]. 

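The fact that BP solutions correspond to restrictions of the original measure to subsets may suggest that there is 
little hope of exploiting (9) on such systems. Fortunately, this is not the case. For every finite system, and every BP 
fixed point $\alpha$, the following holds: 

$$P(\sigma) \propto \prod_i \pi_i^\alpha(\sigma_i) \prod_{(ij)} \frac{\pi_{ij}^\alpha(\sigma_i, \sigma_j)}{\pi_i^\alpha(\sigma_i) \pi_j^\alpha(\sigma_j)}. \tag{10}$$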
FIG. 1. Inference error on a Cayley tree when the leaves are free ($q = 0$) or fixed ($q = 1$) to a random configuration $\xi$. The data 
come from the subspace $\Omega_d(\xi)$ (a sphere of radius $d$ centered at $\xi$). In the inference algorithm we ignore the boundary condition 
and the fact that samples are restricted to $\Omega_d(\xi)$. The internal nodes have degree $K = 3$ and the size of the tree is $N = 766$. 



A proof of a more general statement will be given in Appendix A. As in the ergodic case, (10) can be exploited in at 
least two ways. One is by replacing $\pi_{ij}^\alpha(\sigma_i,\sigma_j)$ and $\pi_i^\alpha(\sigma_i)$ by their experimental estimates inside a state and solving 
the identity between (10) and (3) for $J$, $\theta$, exactly as in the independent pairs approximation, as if one just forgot 
that the samples come from a single ergodic component. A second one is by inducing the BP equations to converge to 
fixed points corresponding to appropriate ergodic components. In this paper we will take the latter option. 

Notice that the second method, as an algorithm, is more flexible than the first one; indeed, there 
is no reason for a BP fixed point to exist for every set of experimental data, especially when the number of samples is not 
too large. This means that matching the data exactly with those of a BP fixed point is not always possible. Therefore, 
a better strategy is to find a good BP solution which is close enough to the experimental data. 

Ignoring the information that our samples come from a single ergodic component would result in a large inference 
error due to the maximization of the wrong likelihood. As an example, we take a tree graph with Ising spins interacting 
through random couplings $-1 < J_{ij} < +1$, in zero external fields $\theta_i = 0$. Choose an arbitrary pattern $\xi$ and fix a 
fraction $q$ of the boundary spins to the values in $\xi$. For $q = 0$ the system would be in the paramagnetic phase for any 
finite $\beta$, therefore the average overlap of the internal spins with the pattern would be zero. On the other hand, for 
$q = 1$ and low temperatures the overlap would be greater than zero, as expected from a localized Gibbs state around 
pattern $\xi$. In this case the observed magnetizations are nonzero and, without any information about the boundary 
condition, we may attribute these magnetizations to external fields, which in turn results in a large inference error in 
the couplings. 

Equivalently, we could leave the boundary spins free but restrict the spin configurations to a subspace, for instance 
a sphere of radius $d$ centered at pattern $\xi$ in the configuration space, $\Omega_d(\xi)$. That is, the system follows the 
measure 

$$\mathcal{P}_d(\sigma) \propto I(\sigma \in \Omega_d(\xi))\, e^{-\beta H(\sigma)}, \tag{11}$$

where $I(\sigma \in \Omega_d(\xi))$ is an indicator function which selects configurations in the subspace $\Omega_d(\xi)$. By the BP approximation 
we can compute the magnetizations and the correlations under this measure; see Appendix B for more details. Taking 
these as experimental data, we may perform the inference by assuming that our data represent the whole configuration 
space. This again would result in a large inference error (for the same reason mentioned before), whereas 
taking into account that the system is limited to $\Omega_d(\xi)$ we are able to infer the right parameters by maximizing the 
correct likelihood, i.e. replacing the total free energy $F$ in the log-likelihood with the free energy associated 
with the subspace $\Omega_d(\xi)$. 

In Figure 1 we display the inference error obtained by ignoring the prior information in the above two cases. Notice 
that in principle the error would be zero if we knew $\Omega_d(\xi)$ and the boundary condition. As seen in the figure, the 
error remains nonzero when the boundary spins are fixed ($q = 1$) even if sampling is performed over the whole space. 
FIG. 2. Inference error versus number of samples for different temperatures. The data are extracted from one pure state of the 
Hopfield model in the memory phase ($\beta = 2$). Size of the system is $N = 1000$, each spin interacts with $K = 12$ other randomly 
selected spins, and the number of stored patterns is $P = 3$. In the inference algorithm we use $\eta = 0.02$. 



IV. INFERENCE IN THE HOPFIELD MODEL 

The Hopfield model can be found in three different phases. For large temperatures the system is in the paramagnetic 
phase where, in the absence of external fields, the magnetizations $m_i$ and the overlaps $O^\mu = \frac{1}{N} \sum_i \xi_i^\mu m_i$ are zero. If the number 
of patterns is smaller than the critical value $P_c$, for small temperatures the system enters the memory phase where 
the overlap between the patterns and the configurations belonging to states selected by the initial conditions can be 
nonzero. For $P > P_c$, the Hopfield model at low temperature enters a spin glass phase, where the overlaps are 
typically zero. In fully connected graphs $P_c \simeq 0.14N$ and in random Poissonian graphs $P_c \simeq 0.637\langle k \rangle$, where $\langle k \rangle \gg 1$ 
is the average degree 0, [l^l . 

Take the Hopfield model with zero external fields and in the memory phase. We measure samples from a Glauber 
dynamics which starts from a configuration close to a pattern $\nu$. The system will stay for a long time in the state $\nu$ 
and is thus well described by the restricted Gibbs measure $\mathcal{P}_\nu(\sigma)$. In a configuration $\sigma$, the local field seen by neuron 
$i$ is 

$$h_i = \sum_{j \in \partial i} J_{ij} \sigma_j = \frac{1}{K} \xi_i^\nu \sum_{j \in \partial i} \xi_j^\nu \sigma_j + \frac{1}{K} \sum_{\mu \neq \nu} \xi_i^\mu \sum_{j \in \partial i} \xi_j^\mu \sigma_j.$$

If $\sigma$ corresponds to the retrieved pattern, the first 
term (signal) has the dominant contribution to $h_i$. The last term (noise) is the contribution of the other patterns 
to the local field. To exploit this information, we look for a set of couplings that result in a Gibbs state equivalent 
to the observed state of the system. One way to do this is by introducing an auxiliary external field pointing in the direction of the 
experimental magnetizations, i.e. $\theta_i = \lambda m_i^{\rm exp}$, for a positive $\lambda$; we may set the couplings $J_{ij}$ at the beginning to zero 
and compute our estimate of the correlations $c_{ij}$ by the BP algorithm. This can be used to update the couplings by 
a small amount in the direction that maximizes the likelihood, as in (7) (we do not update the external fields, which 
for simplicity are assumed to be zero). This updating is repeated iteratively, decreasing $\lambda$ by a small amount in each 
step. The procedure ends when $\lambda$ reaches the value zero. 
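
The annealing loop just described can be sketched as below. The linear schedule for $\lambda$, the function names and the `correlations` callable (standing for whatever routine returns the BP edge correlations at given couplings and fields) are assumptions of this sketch. The two-spin helper is a toy plug-in for which BP is exact; it is there only so that the loop can be run end to end.

```python
import math

def anneal_couplings(edges, m_exp, c_exp, correlations, lam0, eta, steps):
    """Infer couplings while the auxiliary field lambda * m_exp is switched off."""
    J = {e: 0.0 for e in edges}                  # couplings start at zero
    for t in range(steps + 1):
        lam = lam0 * (1.0 - t / steps)           # linear decrease, exactly 0 at the end
        theta = [lam * mi for mi in m_exp]       # auxiliary field along m_exp
        c = correlations(J, theta)               # estimated correlations at current J
        for e in edges:
            J[e] += eta * (c_exp[e] - c[e])      # coupling part of update (7)
    return J

def two_spin_correlations(beta):
    """Toy plug-in: exact <s0 s1> for two spins joined by the single edge (0, 1)."""
    def correlations(J, theta):
        t = math.tanh(beta * J[(0, 1)])
        a, b = math.tanh(beta * theta[0]), math.tanh(beta * theta[1])
        return {(0, 1): (t + a * b) / (1.0 + t * a * b)}
    return correlations
```

In a real application the `correlations` routine would run BP on the full graph and would presumably be warm-started from the messages of the previous step, so that the fixed point stays inside the state selected by the auxiliary field while $\lambda$ decreases.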

The auxiliary field is introduced only to induce convergence of the equations towards a fixed point giving statistics 
inside a particular state. Figure 2 compares the inference error obtained with the above procedure for several values 
of temperature in the memory phase. For the parameters in the figure, the couplings inferred from one basin were 
enough to recover the other two patterns. In the figure we also see how the error decreases when a larger number 
of samples is taken from the system. 

In general we may have samples from different states of a system. Let us assume that in the Hopfield model we 
stored $P$ patterns by the Hebb rule but the samples are from $Q$ basins. The estimated correlations in any state 
$\mu \in \{1, \dots, Q\}$ should be as close as possible to the experimental values $c_{ij}^{{\rm exp},\mu}$. 

A natural generalization of the previous algorithm is the following. As before we introduce external fields $\theta_i^\mu = 
\lambda m_i^{{\rm exp},\mu}$ for each state $\mu$. At fixed positive $\lambda$ we compute the estimated BP correlations $c_{ij}^\mu$ for the different states. Each 
of these estimates can be used to update the couplings as in the single state case. Specifically, this amounts 
to making a single additive update to the couplings by the average vector $\eta \Delta c$, given by $\Delta c_{ij} = \bar{c}_{ij}^{\rm exp} - \bar{c}_{ij}$, where 

$\bar{c}_{ij}^{\rm exp} = \frac{1}{Q} \sum_{\mu=1}^{Q} c_{ij}^{{\rm exp},\mu}$ and $\bar{c}_{ij} = \frac{1}{Q} \sum_{\mu=1}^{Q} c_{ij}^\mu$. Indeed, the addends of $\Delta c$ will typically be linearly independent, so 
FIG. 3. Evolution of the inference error with update iterations. The data are obtained by sampling from one or several pure 
states of the Hopfield model in the memory phase ($\beta = 2$). The total number of samples in each case is $M = 60000$. Size of the 
system is $N = 1000$, each spin interacts with $K = 12$ other randomly selected spins, and the number of stored patterns is $P = 3$. 
In the inference algorithm we use $\eta = 0.02$. 
FIG. 4. Inference error and number of states which are stable and highly correlated with the patterns after sampling from 
$Q$ pure states of the Hopfield model in the memory phase ($\beta = 2$). Size of the system is $N = 1000$, each spin interacts with 
$K = 24$ other randomly selected spins, and the number of stored patterns is $P = 8$. In the inference algorithm we use $\eta = 0.02$ 
and the number of samples is $M = Q \times 20000$. 



$\Delta c = 0$ will imply 

$$c_{ij}^{{\rm exp},\mu} = c_{ij}^\mu$$

for $\mu = 1, \dots, Q$. 
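
The averaged update over the $Q$ sampled states amounts to a few lines. In this sketch the per-state correlations are passed as lists of dictionaries keyed by edge, which is a representation chosen here for illustration.

```python
def averaged_update(J, c_exp_by_state, c_bp_by_state, eta):
    """Single additive update J_ij += eta * mean over states of (c_exp - c_bp)."""
    Q = len(c_exp_by_state)
    new_J = {}
    for e in J:
        # average deviation between measured and estimated correlations
        delta = sum(c_exp_by_state[a][e] - c_bp_by_state[a][e] for a in range(Q)) / Q
        new_J[e] = J[e] + eta * delta
    return new_J
```

Each entry of `c_bp_by_state` would come from a separate BP run with the auxiliary field of the corresponding state, all sharing the same couplings `J`.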



We then decrease $\lambda$ and repeat the BP computation and update steps until the 
external field goes to zero. Figures 3 and 4 show how the inference error changes with sampling from different states. 
Notice that if we had an algorithm that returns exact correlations, an infinite number of samples from one state would 
be enough to infer the right interactions in the thermodynamic limit. However, given that we are limited by the 
number of samples, the learning process is more efficient if this number is taken from different states instead of just 
one. 
FIG. 5. Comparing the histogram of Hebbian couplings with that of learned couplings for small and large values of the external 
field. Size of the system is $N = 1000$, each spin interacts with $K = 8$ other randomly selected spins. Here we aim to store $P = 8$ 
random and uncorrelated patterns. In the learning algorithm we use $\beta = 2$ and $\eta = 0.003$. 



Hebbian learning is a stylized way of representing learning processes. Among the many oversimplifications that 
it involves there is the fact that patterns are assumed to be presented to the networks through very strong biasing 
signals. On the contrary it is of biological interest to consider the opposite limit where only weak signals are allowed 
and retrieval takes place with a sizable amount of errors. In the spin language we are thus interested in the case in 
which the system is only slightly biased toward the patterns during the learning phase. In what follows we show that 
one can "invert" the inference process discussed in the previous sections and define a local learning rule which copes 
efficiently with this problem. 

As a first step we consider a learning protocol in which the patterns are presented sequentially and in random order 
to the system by applying an external field in the direction of the pattern, that is a field $\theta_i^\mu = \lambda \xi_i^\mu$ with $\lambda > 0$. We assume 
that initially all couplings are zero. Depending on the strength of the field, the system will be forced to explore 
configurations at different overlaps with the presented pattern $\mu$. A small $\lambda$ corresponds to weak or noisy learning, 
whereas for large $\lambda$ the system has to remain very close to the pattern. What counts as a small or large $\lambda$ of course depends 
on the temperature and the strength of the couplings. Here we assume the couplings are initially zero, so $\beta\lambda \sim 1$ defines 
the boundary between weak and strong fields. 

The learning algorithm should indeed force the system to follow the behavior suggested by the auxiliary field. 
Therefore, it seems reasonable to try to match the correlations in the two cases: in the absence and in the presence of the field. 
Notice the similarities and differences with the first part of the study. As before we update the couplings 
according to deviations in the correlations. Here, however, the auxiliary field is necessary for the learning; without it the 
couplings would not be updated anymore. Moreover, it is obvious that we cannot match exactly the correlations in 
the absence and presence of an external field. We just push the system for a while towards one of the patterns to reach 
a stationary state in which all the patterns are remembered. 

For any $\lambda$ we can compute the correlations $c_{ij}^{\lambda}$ either by sampling from the Glauber dynamics or by directly running 
BP, with external fields $\theta_i^\mu = \lambda \xi_i^\mu$. At the same time we can compute the correlations $c_{ij}^{0}$ by the BP algorithm in zero 
external fields and with initial messages corresponding to pattern $\mu$. Then we try to find couplings which match the 
correlations in the two cases, namely we update the couplings by a quantity $\eta\,(c_{ij}^{\lambda} - c_{ij}^{0})$. The process is repeated for 
all couplings and for $O(1)$ iterations with the same pattern $\mu$. Next we switch to some other randomly selected 
pattern $\nu$ and the whole process is repeated over many learning steps. Notice that here $\lambda$ is fixed from the beginning. 

The above learning protocol displays a range of interesting phenomena. Firstly, one notices that for $c_{ij}^{\lambda} \simeq \xi_i^\mu \xi_j^\mu$ 
(i.e. for very large external fields) and $c_{ij}^{0} \simeq 0$ (i.e. for very high temperature or isolated neurons) the above learning 
results in the Hebb couplings of the Hopfield model. In Figure 5 we compare the histogram of learned couplings for 
small and large $\lambda$ with the Hebbian ones. 
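
The reduction to the Hebb rule in the strong-field limit can be checked directly. The sketch below applies the local update "learning rate times (correlation with field minus correlation without field)" and then idealizes the limit by setting the in-field correlations to $\xi_i^\mu \xi_j^\mu$ and the zero-field ones to zero. The function names, and the choice of learning rate $1/(K \cdot \text{repeats})$ that makes the normalization match the Hebb rule, are assumptions of this illustration.

```python
def learning_step(J, c_field, c_free, eta):
    """Local rule: J_ij += eta * (c_ij with the pattern field - c_ij in zero field)."""
    return {e: J[e] + eta * (c_field[e] - c_free[e]) for e in J}

def strong_field_limit(edges, patterns, K, repeats):
    """Idealized limit: in-field correlations xi_i xi_j, zero-field correlations 0."""
    eta = 1.0 / (K * repeats)            # chosen so that the result is J = (1/K) sum_mu xi xi
    J = {e: 0.0 for e in edges}
    for xi in patterns:
        c_field = {(i, j): xi[i] * xi[j] for (i, j) in edges}
        c_free = {e: 0.0 for e in edges}
        for _ in range(repeats):
            J = learning_step(J, c_field, c_free, eta)
    return J
```

Away from this limit both correlation sets depend on the current couplings and have to be recomputed (by BP or sampling) before every update, which is what makes the learned couplings differ from the Hebbian ones at small fields.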

The number of patterns Ps which are highly correlated with stable configurations depends on the strength of 
FIG. 6. Evolution of the average overlap and fraction of successfully learned patterns in the learning algorithm. Size of the 
system is $N = 1000$, each spin interacts with $K = 8$ other randomly selected spins. In the learning algorithm we set $\lambda = 0.2$, 
$\beta = 2$, $\eta = 0.003$, and the number of patterns to be stored is $P = 8$. 
FIG. 7. Average number of successfully learned patterns in the learning algorithm for different values of $\lambda$. Size of the system 
is $N = 1000$, each spin interacts with $K = 16$ other randomly selected spins. The learning algorithm works at $\beta = 2$, $\eta = 0.003$, 
and the number of patterns to be stored is $P = 16$. The average is taken over 10 realizations of the patterns. 



external fields. We consider that pattern $\mu$ is "learned" if there is a Gibbs state with nonzero overlap $O^\mu$ that is 
definitely larger than the other ones $\{O^{\mu'} | \mu' \neq \mu\}$. Figures 6 and 7 show how these quantities evolve during the learning 
process and with increasing magnitude of the external field. 

For small $P$ nearly all patterns are learned, whereas for larger $P$ some patterns are missing. A large number 
of patterns can thus be learned at the price of smaller overlaps and weaker states. That is, the average overlap in 
successfully learned patterns decreases continuously with increasing $P$, approaching the paramagnetic limit. In Figure 
8 we compare this behavior with that of Hebb couplings. 

As the figure shows, there is a main qualitative difference between Hebbian learning in the Hopfield model and 
the protocol discussed here. In the former case, when the number of stored patterns exceeds some critical value the 
system enters a spin glass phase where all memories are lost and the BP algorithm does not converge anymore. 
In our case, on the contrary, many patterns can be stored without ever entering the spin glass phase (for a wide range 
of choices of $\lambda$). The BP algorithm always converges, possibly to a wrong fixed point if the corresponding pattern is 
not stored. 
FIG. 8. Average number of successfully learned patterns in the learning algorithm and Hebb rule versus $P$, the number of 
patterns to be stored. The inset shows the average overlap. Size of the system is $N = 1000$, each spin interacts with 
$K = 16$ other randomly selected spins. The learning algorithm works at $\lambda = 0.2$, $\beta = 2$, $\eta = 0.003$. The average is taken over 
10 realizations of the patterns. 



A. Population dynamics analysis of the learning protocol 

Population dynamics is usually used to obtain the asymptotic and average behavior of quantities that obey a set 
of deterministic or stochastic equations [25]. For instance, to obtain the phase diagram of the Hopfield model with 
population dynamics one introduces a population of $N_p$ messages representing the BP cavity messages in a reference 
state, e.g. pattern $\xi^\nu = +1$. Then one updates the messages in the population according to the BP equations: 
at each time step a randomly selected cavity message is replaced with a new one computed from $K - 1$ randomly 
selected ones appearing on the r.h.s. of the BP equations. In each update, one generates the random couplings 
$J_{ij} = \frac{1}{K} + \frac{1}{K} \sum_{\mu \neq \nu} \xi_i^\mu \xi_j^\mu$ by sampling the other $P - 1$ random patterns. After a sufficiently large number of updates, 
one can compute the average overlap with the condensed pattern $\nu$ to check if the system is in a memory phase. 
The stability of the condensed state depends on the stability of the above dynamics with respect to small noise in 
the cavity messages. If $N_p$ is large enough, one obtains the phase diagram of the Hopfield model in the thermodynamic 
limit, averaged over the ensemble of random regular graphs and patterns. We used the above population dynamics to 
obtain the phase diagram of the Hopfield model on random regular graphs, see Figure 9. 
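
A minimal version of this population dynamics, for Hebb couplings and zero external field, might look as follows. It represents each cavity message by a cavity field, takes the condensed pattern to be all $+1$, and draws the product $\xi_i^\mu \xi_j^\mu$ of each non-condensed pattern as an independent random sign on every edge. The parameter names, default values and the initialization of the population are assumptions of this sketch, and the stability analysis mentioned in the text is not included.

```python
import math
import random

def population_dynamics(K, P, beta, pop_size=500, updates=50000, seed=0):
    """Average overlap with the condensed pattern (all +1) on random K-regular graphs."""
    rng = random.Random(seed)
    pop = [1.0] * pop_size               # cavity fields, started inside the condensed state

    def coupling():
        # condensed pattern contributes +1; each other pattern a random sign xi_i * xi_j
        noise = sum(rng.choice((-1, 1)) for _ in range(P - 1))
        return (1.0 + noise) / K

    def u(J, h):
        # effective field sent through a coupling J by a neighbor with cavity field h
        return math.atanh(math.tanh(beta * J) * math.tanh(beta * h)) / beta

    for _ in range(updates):
        # new cavity field from K - 1 randomly chosen members of the population
        new = sum(u(coupling(), rng.choice(pop)) for _ in range(K - 1))
        pop[rng.randrange(pop_size)] = new

    # full local field uses K incoming messages; magnetization = overlap with all-(+1)
    total = 0.0
    for _ in range(pop_size):
        field = sum(u(coupling(), rng.choice(pop)) for _ in range(K))
        total += math.tanh(beta * field)
    return total / pop_size
```

Scanning `beta` and `P` and recording where the returned overlap drops to zero would give a rough picture of the boundary of the memory phase in this simplified setting.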

In order to study the new learning protocol we need a more sophisticated population dynamics. The reason is that, 
in contrast to Hebb couplings, we do not know the learned couplings in advance. In Appendix C we explain in more 
detail the population dynamics that we use to analyze the learning process studied in this paper. The algorithm is 
based on $P$ populations of BP messages and one population of couplings. These populations represent the probability 
distributions of BP messages in different states and of couplings over the interaction graph. For a fixed set of patterns 
$\{\xi^\mu | \mu = 1, \dots, P\}$ we update the populations according to the BP equations and the learning rule, to reach a steady 
state. Figure 10 displays the histogram of couplings obtained in this way. In the figure we compare the two cases of 
bounded and unbounded couplings. In the first case the couplings should have a magnitude less than or equal to 1, 
whereas in the second case they are free to take larger values. We observe a clear difference between the two cases: 
when $\lambda$ is small, the couplings are nearly clipped in the bounded case, whereas the unbounded couplings go beyond 
$\pm 1$. However, in both cases there is some structure in the range of small couplings. Increasing the magnitude of $\lambda$ we 
get more and more structured couplings. For very large fields they are similar to the Hebb couplings. For small fields 
the histogram of the couplings is very different from the Hebb one, though the sign of the learned and the Hebbian 
couplings is the same. 

A few comments are in order. In the population dynamics we do not have a fixed graph structure, so 
to distinguish the P patterns from each other we have to fix them at the beginning of the algorithm. Moreover, we have 
to modify the BP equations to ensure that the populations represent the given patterns, see Appendix C. 
Finally, the outcome is an average over the ensemble of random regular graphs, for a fixed set of patterns. 

Having the stationary population of couplings, one can check the stability of each state by checking the stability of 
the BP equations at the corresponding fixed point. The maximum capacity that we obtain in this way for the learned 
couplings is the same as the Hebb one, whereas on single instances we could store a much larger number of patterns. The 
reason why we do not observe this phenomenon in the population dynamics lies in the way we check 
stability: the fixed patterns should be stable in the ensemble of random regular graphs. In other words, checking for 
stability in the population dynamics is a stronger requirement than checking it on a specific graph. 

FIG. 9. The phase diagram of the Hopfield model on random regular graphs of degree K = 12 obtained with population dynamics 
($N_p$ = …). The horizontal axis is the number of patterns P and the vertical axis is the temperature $T = 1/\beta$. The paramagnetic, memory 
and spin glass phases are labeled P, M and SG, respectively. 

The main result of our analysis consists in showing that the distribution of the couplings arising from the BP 
learning protocol is definitely different from the Hebbian one. 



VI. DISCUSSION AND PERSPECTIVES 



We studied the finite connectivity inverse Hopfield problem at low temperature, where the data are sampled from 
a non-ergodic regime. We showed that the information contained in the fluctuations within single pure states can be 
used to infer the correct interactions. 

We also used these findings to design a simple learning protocol which is able to store patterns learned under noisy 
conditions. Surprisingly enough, it was possible to show that by demanding a small though finite overlap with the 
patterns it is possible to store a large number of patterns without ever reaching a spin glass phase. The learning 
process avoids the spin glass phase by decreasing the overlaps as the number of patterns increases. A separate analysis, 
similar to the one presented in Ref. [27] (and not reported here), shows that the equations can be heavily 
simplified without losing their main learning capabilities. 

In this paper we focused on a simple model of neural networks with symmetric couplings. It would be interesting to 
study more realistic models like the integrate and fire model of neurons with general asymmetric couplings. Moreover, 
instead of random unbiased patterns one may consider sparse patterns which are more relevant in the realm of neural 
networks. 

The arguments presented in this paper can also be relevant to the problem of inferring a dynamical model for a system 
by observing its dynamics. In this case, a system is defined solely by its evolution equations and one cannot rely 
on the Boltzmann equilibrium distribution. Still, it is possible to try to infer the model by writing the likelihood for 
the model parameters given the data and given the underlying dynamical stochastic process. A mean-field approach 
has been recently described in [26]. We actually checked this approach in our problem and observed qualitatively the 
same behavior as the static approach. In fact, which method is best heavily depends on the type of data which are 
available. 
FIG. 10. The histogram of learned couplings obtained by the population dynamics in random regular graphs of degree K = 8. 
The number of patterns to be stored is P = 8. In the upper panel we compare the two cases of learning with bounded and 
unbounded couplings for a small external field. In the lower panel we compare the Hebb rule with the learning algorithm for a 
large external field. In the algorithm we use $N_p = 1000$, $\beta = 1$, and $\eta = 0.01$. 
Appendix A: Proof of Exactness of the Bethe expression for arbitrary BP fixed points 

A $\beta \to \infty$ limit version of this result (except the determination of the value of the constant $Z_{Gibbs}/Z_{Bethe}$) appeared 
in [29]. This result is valid for general (non-zero) interactions. Consider a family of "potentials" $\{\psi_a : \psi_a(x_a) > 0\}_{a \in A}$, 
where we denote by $x_a$ the subvector of $x$ given by $\{x_i : i \in \partial a\}$. We will use the shorthand $i \in a$ or $a \in i$ to mean 
$i \in \partial a$. 

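Proposition. Given a factorized probability function 

$$\mathcal{P}(x) = \frac{1}{Z_{Gibbs}} \prod_a \psi_a(x_a)$$

and a BP fixed point $\{b_{ia}\}_{i \in a,\, a \in A}$ with plaquette marginals $b_a(x_a) = z_a^{-1} \psi_a(x_a) \prod_{i \in a} b_{ia}(x_i)$ and single marginals 
$b_i(x_i) = \sum_{\{x_j : j \in a \setminus i\}} b_a(x_a)$ for every $a \in i$, then 

$$\mathcal{P}(x) = \frac{Z_{Bethe}}{Z_{Gibbs}} \prod_a b_a(x_a) \prod_i b_i(x_i)^{1 - n_i},$$

where $n_i = |\partial i|$. 

Proof. Using the fact that $b_i(x_i) = z_i^{-1} \prod_{a \in i} b_{ai}(x_i)$ and $b_{ia}(x_i) = z_{ia}^{-1} \prod_{e \in i \setminus a} b_{ei}(x_i)$, we obtain 
$b_{ia}(x_i) = b_i(x_i)\, z_i\, b_{ai}^{-1}(x_i)\, z_{ia}^{-1}$. Then, using the definitions, 

$$\prod_a \psi_a(x_a) = \prod_a \frac{z_a b_a(x_a)}{\prod_{i \in a} b_{ia}(x_i)} = \prod_a z_a b_a(x_a) \prod_{i \in a} \frac{z_{ia} b_{ai}(x_i)}{z_i b_i(x_i)} = \Big( \prod_a z_a \prod_{i \in a} \frac{z_{ia}}{z_i} \Big) \prod_a b_a(x_a) \prod_i \frac{\prod_{a \in i} b_{ai}(x_i)}{b_i(x_i)^{n_i}}.$$

Since $\prod_{a \in i} b_{ai}(x_i) = z_i b_i(x_i)$, the right-hand side equals an $x$-independent constant, identified with $Z_{Bethe}$, times 
$\prod_a b_a(x_a) \prod_i b_i(x_i)^{1 - n_i}$. 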

This proves that a fixed point can be interpreted as a form of reparametrization of the original potentials. In fact, 
a sort of converse also holds: 

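Proposition. If $\mathcal{P}(x) \propto \prod_a \psi_a(x_a)$ satisfies a Bethe-type expression 

$$\mathcal{P}(x) \propto \prod_a b_a(x_a) \prod_i b_i(x_i)^{1 - n_i}, \qquad n_i = |\partial i|,$$

with $\sum_{\{x_j : j \in a \setminus i\}} b_a(x_a) = b_i(x_i)$ for every $i \in a$, then there exists a BP fixed point $\{b_{ia}\}_{i \in a,\, a \in A}$ such that 
$b_a(x_a) \propto \psi_a(x_a) \prod_{i \in a} b_{ia}(x_i)$. 

Proof. Choose any configuration $z$. We will use the following notation: $z_{a \setminus i} = \{z_j\}_{j \in a \setminus i}$, and $z_{-i} = \{z_j\}_{j \neq i}$. Define 
$b_{ia}(x_i) \propto \psi_a^{-1}(x_i, z_{a \setminus i})\, b_a(x_i, z_{a \setminus i})$, normalized appropriately. Afterwards, we can define $b_{ai}(x_i) \propto b_i(x_i)\, b_{ia}^{-1}(x_i)$. 

By definition of $\mathcal{P}$ we have $\mathcal{P}(x_i | z_{-i}) \propto \prod_{a \in i} \psi_a(x_i, z_{a \setminus i})$. Similarly, but using the Bethe-type expression, 
we have also $\mathcal{P}(x_i | z_{-i}) \propto b_i(x_i)^{1 - n_i} \prod_{a \in i} b_a(x_i, z_{a \setminus i})$. Then 

$$b_i(x_i)^{n_i - 1} \propto \prod_{a \in i} \frac{b_a(x_i, z_{a \setminus i})}{\psi_a(x_i, z_{a \setminus i})} \propto \prod_{a \in i} b_{ia}(x_i) \propto \prod_{a \in i} b_i(x_i) \prod_{a \in i} b_{ai}^{-1}(x_i),$$

and thus $b_i(x_i) \propto \prod_{a \in i} b_{ai}(x_i)$. This also implies that $b_{ia}(x_i) \propto b_i(x_i)\, b_{ai}^{-1}(x_i) \propto \prod_{e \in i \setminus a} b_{ei}(x_i)$, 
proving that the first BP equation is satisfied. 

By definition of $\mathcal{P}$, $\mathcal{P}(x_a | z_{-a}) \propto \psi_a(x_a) \prod_{i \in a} c_i(x_i)$ where $c_i(x_i) = \prod_{e \in i \setminus a} \psi_e(x_i, z_{e \setminus i})$. Moreover, using the Bethe-type expression, we 
can conclude that also $\mathcal{P}(x_a | z_{-a}) \propto b_a(x_a) \prod_{i \in a} d_i(x_i)$ where $d_i(x_i) = b_i(x_i)^{1 - n_i} \prod_{e \in i \setminus a} b_e(x_i, z_{e \setminus i})$. This implies 
that 

$$\frac{b_a(x_a)}{\psi_a(x_a)} \propto \prod_{i \in a} \frac{c_i(x_i)}{d_i(x_i)}.$$

But also $c_i(x_i)\, d_i(x_i)^{-1} \propto b_i(x_i)^{n_i - 1} \prod_{e \in i \setminus a} b_{ie}^{-1}(x_i) \propto \prod_{e \in i \setminus a} b_{ei}(x_i) \propto b_{ia}(x_i)$, so 
$b_a(x_a) \propto \psi_a(x_a) \prod_{i \in a} b_{ia}(x_i)$ as desired. Now $b_i(x_i) = \sum_{\{x_j : j \in a \setminus i\}} b_a(x_a)$ by hypothesis, so 
$b_{ai}(x_i) \propto b_i(x_i)\, b_{ia}^{-1}(x_i) \propto \sum_{\{x_j : j \in a \setminus i\}} \psi_a(x_a) \prod_{j \in a \setminus i} b_{ja}(x_j)$, and this proves that the second BP equation is also satisfied. 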
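
The first proposition of this appendix can be checked numerically on a small loopy graph. The following Python sketch is an illustration only and makes several assumptions not taken from the text: it uses a pairwise Ising model with hypothetical couplings `J` and fields `h`, a damped BP iteration, and single marginals obtained by marginalising one incident pair marginal. At a fixed point the ratio between the Boltzmann weight and the Bethe-type product of marginals should be the same for every configuration.

```python
import itertools
import numpy as np

SPINS = np.array([-1.0, 1.0])

def bethe_check(J, h, iters=400, damping=0.5):
    """Run BP on a small pairwise Ising model; return (ratios, consistency_gap)."""
    n = len(h)
    nbrs = [[j for j in range(n) if j != i and J[i, j] != 0] for i in range(n)]
    edges = [(i, j) for i in range(n) for j in nbrs[i] if i < j]
    # msg[(i, j)] is the message from i to j, a function of x_j.
    msg = {(i, j): np.full(2, 0.5) for i in range(n) for j in nbrs[i]}

    def cavity(i, skip):
        # Local field term times all incoming messages except the one from `skip`.
        out = np.exp(h[i] * SPINS)
        for k in nbrs[i]:
            if k != skip:
                out = out * msg[(k, i)]
        return out

    def pair_factor(i, j):
        return np.exp(J[i, j] * np.outer(SPINS, SPINS))  # indexed [x_i, x_j]

    for _ in range(iters):
        for (i, j) in msg:
            new = cavity(i, j) @ pair_factor(i, j)
            msg[(i, j)] = damping * msg[(i, j)] + (1 - damping) * new / new.sum()

    # Plaquette (pair) marginals built from the messages.
    b_pair = {}
    for (i, j) in edges:
        t = np.outer(cavity(i, j), cavity(j, i)) * pair_factor(i, j)
        b_pair[(i, j)] = t / t.sum()

    # Single marginals from ONE incident pair marginal; the gap measures how
    # much the other incident pair marginals disagree (zero at a fixed point).
    b_site, gap = {}, 0.0
    for (i, j) in edges:
        for node, marg in ((i, b_pair[(i, j)].sum(axis=1)), (j, b_pair[(i, j)].sum(axis=0))):
            if node in b_site:
                gap = max(gap, np.abs(marg - b_site[node]).max())
            else:
                b_site[node] = marg

    ratios = []
    for x in itertools.product([0, 1], repeat=n):
        s = SPINS[list(x)]
        weight = np.exp(h @ s + sum(J[i, j] * s[i] * s[j] for (i, j) in edges))
        bethe = np.prod([b_site[i][x[i]] for i in range(n)])
        for (i, j) in edges:
            bethe *= b_pair[(i, j)][x[i], x[j]] / (b_site[i][x[i]] * b_site[j][x[j]])
        ratios.append(weight / bethe)
    return np.array(ratios), gap
```

For a triangle with weak couplings the returned `ratios` agree to numerical precision once BP has converged, and their common value plays the role of the constant relating the unnormalized weight to the Bethe-type product.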

Appendix B: Computing thermodynamic quantities in a restricted space 

Consider the Ising model on a tree graph of size N with couplings $J$ and external fields $\theta$. Suppose that we are 
given a reference point $\xi$ in the configuration space $\{-1,+1\}^N$ and the following measure 

$$\mathcal{P}_d(\sigma) \propto \mathbb{I}(\sigma \in \Omega_d(\xi))\, e^{\sum_i \beta \theta_i \sigma_i + \sum_{i<j} \beta J_{ij} \sigma_i \sigma_j}, \qquad (B1)$$

where $\Omega_d(\xi)$ is a sphere of radius $d$ centered at $\xi$. By distance between two configurations we mean the Hamming distance, 
i.e. the number of spins which differ in the two configurations. The aim is to compute thermodynamic quantities 
like average magnetizations and correlations in an efficient way. We do this by means of the Bethe approximation 
and thus the BP algorithm. 

First we express the global constraint $\mathbb{I}(\sigma \in \Omega_d(\xi))$ as a set of local constraints by introducing messages $d_{i \to j}$ that 
each node sends to its neighbors. For a given configuration $\sigma$, 

$$d_{i \to j} = \sum_{k \in \partial i \setminus j} d_{k \to i} + (1 - \delta_{\sigma_i, \xi_i}) \qquad (B2)$$

denotes the distance of $\sigma$ from $\xi$ in the cavity graph $G_{i \to j}$, which includes $i$ and all nodes connected to $j$ through $i$. 
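
The recursion (B2) is easy to illustrate on a small tree. The sketch below is a hypothetical example (the tree, the configuration and the function name are not from the text): it evaluates $d_{i \to j}$ recursively and can be used to confirm that, on any edge, the two opposite messages add up to the total Hamming distance.

```python
def cavity_distance(i, j, sigma, xi, neighbors):
    """d_{i->j} of Eq. (B2): mismatches between sigma and xi on the branch of i seen from j."""
    d = 0 if sigma[i] == xi[i] else 1      # the term 1 - delta(sigma_i, xi_i)
    for k in neighbors[i]:
        if k != j:                         # sum over neighbours of i other than j
            d += cavity_distance(k, i, sigma, xi, neighbors)
    return d

# Hypothetical 5-node tree with edges 0-1, 1-2, 1-3, 3-4.
neighbors = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
xi = [1, 1, 1, 1, 1]
sigma = [1, -1, 1, -1, -1]
total_distance = sum(s != x for s, x in zip(sigma, xi))
```

On a tree, every edge splits the graph into two branches, which is why a purely local check on a single edge is enough to enforce the global distance constraint.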
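With these new variables we can write the BP equations as 

$$\pi_{i \to j}(\sigma_i, d_{i \to j}; \sigma_j, d_{j \to i}) \propto e^{\beta \theta_i \sigma_i} \sum_{\{\sigma_k, d_{k \to i} \mid k \in \partial i \setminus j\}} \mathbb{I}_i \prod_{k \in \partial i \setminus j} e^{\beta J_{ik} \sigma_i \sigma_k}\, \pi_{k \to i}(\sigma_k, d_{k \to i}; \sigma_i, d_{i \to k}), \qquad (B3)$$

where $\mathbb{I}_i$ is an indicator function that checks the constraints on $d_{i \to k}$ and $\sum_{k \in \partial i} d_{k \to i} + (1 - \delta_{\sigma_i, \xi_i}) \le d$. Starting from 
random initial values for the BP messages we update them according to the above equation. After convergence the 
local marginals read 

$$\pi(\sigma_i, \sigma_j) \propto e^{\beta J_{ij} \sigma_i \sigma_j} \sum_{d_{i \to j},\, d_{j \to i}} \mathbb{I}_{ij}\, \pi_{i \to j}(\sigma_i, d_{i \to j}; \sigma_j, d_{j \to i})\, \pi_{j \to i}(\sigma_j, d_{j \to i}; \sigma_i, d_{i \to j}), \qquad (B4)$$

where in $\mathbb{I}_{ij}$ we check if $d_{i \to j} + d_{j \to i} \le d$. These marginals will be used to compute the average magnetizations and 
correlations. Notice that when the graph is not a tree we need to pass the messages $d_{i \to j}$ only along the edges of a 
spanning tree (or chain) which is selected and fixed at the beginning of the algorithm. 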
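
For very small systems the restricted measure (B1) can be evaluated by direct enumeration, which gives a reference against which a message-passing implementation may be compared. The Python sketch below is such a brute-force check, not the BP scheme of this appendix; it assumes a symmetric coupling matrix `J` with zero diagonal and a constraint of the form "Hamming distance at most d", and the function name is illustrative.

```python
import itertools
import numpy as np

def restricted_magnetizations(J, theta, xi, d, beta):
    """Exact magnetizations under (B1), enumerating configurations within distance d of xi."""
    xi = np.asarray(xi, dtype=float)
    n = len(xi)
    Z, m = 0.0, np.zeros(n)
    for flips in range(d + 1):
        for idx in itertools.combinations(range(n), flips):
            s = xi.copy()
            s[np.array(idx, dtype=int)] *= -1           # flip the chosen spins
            # sum_i theta_i s_i + sum_{i<j} J_ij s_i s_j for symmetric J, zero diagonal
            log_weight = beta * (theta @ s + 0.5 * s @ J @ s)
            w = np.exp(log_weight)
            Z += w
            m += w * s
    return m / Z
```

The cost grows combinatorially with `d`, so this is only practical for a handful of spins; the message-passing formulation above is what makes larger systems accessible.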

Appendix C: Population dynamics 

Consider P patterns $\xi_a^{\mu} \in \{-1,+1\}$, where $\mu = 1, \ldots, P$ and $a$ goes from 1 to $N_p$, which is equivalent to the size of 
the system. The patterns, the learning rate $\eta$ and the parameter $\lambda$ are fixed at the beginning of the algorithm. To each pattern 
we assign a population of messages $\pi^{\mu}_{a,l}(\sigma)$ where $l = 1, \ldots, K$ ($K$ is the node degree). These represent the 
normalized BP messages that we use in the learning algorithm. Besides these we also have a population of couplings 
$J_{ab}$. 

The population dynamics has two update steps: updating the P populations of messages and updating the 
population of couplings. 

To update the messages in population $\mu$ we do the following: 

i) select randomly $(a_0, l_0)$ and $\{(a_1, l_1), \ldots, (a_{K-1}, l_{K-1})\}$, 

ii) use messages $\{\pi^{\mu}_{a_1,l_1}, \ldots, \pi^{\mu}_{a_{K-1},l_{K-1}}\}$ and couplings $\{J_{a_0,a_1}, \ldots, J_{a_0,a_{K-1}}\}$ to compute a new BP message $\pi_{new}$, 

iii) replace message $\pi^{\mu}_{a_0,l_0}(\xi^{\mu}_{a_0})$ with $\max(\pi_{new}(\xi^{\mu}_{a_0}), \pi_{new}(-\xi^{\mu}_{a_0}))$. 

Note the maximum taken in the last step. This ensures that the BP messages in population $\mu$ are 
related to pattern $\xi^{\mu}$. We do these updates for $t_{BP}$ iterations, where in each iteration all members of a population 
are updated in a random sequential way. 
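
Steps ii) and iii) above can be sketched as follows. This is a minimal illustration under assumptions that are not stated in the text: each message is stored as a cavity field `h` with $\pi(\sigma) \propto e^{\beta h \sigma}$, external fields are left out of the message update, and the array layout and function name are hypothetical. In this parametrization, taking the maximum in step iii) amounts to aligning the new field with $\xi^{\mu}_{a_0}$.

```python
import numpy as np

def update_message(h_mu, xi_mu, J, target, picks, beta):
    """Steps ii) and iii) for population mu, with messages stored as cavity fields.

    h_mu[a, l] encodes pi^mu_{a,l}(sigma) proportional to exp(beta * h_mu[a, l] * sigma).
    """
    a0, l0 = target
    new_field = 0.0
    for (a, l) in picks:                    # the K - 1 sampled population members
        t = np.tanh(beta * J[a0, a]) * np.tanh(beta * h_mu[a, l])
        new_field += np.arctanh(t) / beta   # standard Ising cavity bias
    # Step iii): the weight on xi^mu_{a0} must be the larger of the two,
    # i.e. the field has to point along xi^mu_{a0}.
    h_mu[a0, l0] = xi_mu[a0] * abs(new_field)
    return h_mu[a0, l0]
```

Without the sign alignment, the P populations would be free to drift towards the same state and could no longer be associated with distinct patterns.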

To update the couplings we go through the P populations and do the following: 

i) select randomly $(a, l_a)$ and $(b, l_b)$, 

ii) use messages $\pi^{\mu}_{a,l_a}$, $\pi^{\mu}_{b,l_b}$ and coupling $J_{ab}$ to compute the correlation $c^{\mu,\lambda}_{ab}$, i.e. in the presence of the external fields $\lambda \xi^{\mu}_a$ and $\lambda \xi^{\mu}_b$, 

iii) use messages $\pi^{\mu}_{a,l_a}$, $\pi^{\mu}_{b,l_b}$ and coupling $J_{ab}$ to compute the correlation $c^{\mu}_{ab}$, i.e. in the absence of the external fields, 

iv) update the coupling as $J_{ab} = J_{ab} + \eta (c^{\mu,\lambda}_{ab} - c^{\mu}_{ab})$. 

The learning updates are done for $t_L$ iterations. 
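
A single coupling update can be sketched as below. The sketch assumes, as an illustration only, that the two messages are encoded by cavity fields `h_a` and `h_b` and that the external fields simply add $\lambda \xi$ to those fields when the clamped correlation is computed; the function names are hypothetical. The pair correlation is the exact two-spin average for a pair with coupling `J` and fields `ha`, `hb`.

```python
import numpy as np

def pair_correlation(beta, J, ha, hb):
    """<s_a s_b> for p(s_a, s_b) proportional to exp(beta * (ha*s_a + hb*s_b + J*s_a*s_b))."""
    ta, tb, tj = np.tanh(beta * ha), np.tanh(beta * hb), np.tanh(beta * J)
    return (tj + ta * tb) / (1 + tj * ta * tb)

def learning_step(J_ab, h_a, h_b, xi_a, xi_b, lam, eta, beta):
    """Steps ii) to iv): move J_ab by eta times (clamped minus free correlation)."""
    c_field = pair_correlation(beta, J_ab, h_a + lam * xi_a, h_b + lam * xi_b)
    c_free = pair_correlation(beta, J_ab, h_a, h_b)
    return J_ab + eta * (c_field - c_free)
```

Starting from zero coupling and zero fields, the update pushes `J_ab` up when the two pattern entries agree and down when they disagree, which matches the observation that the learned and Hebbian couplings share the same sign.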

Altogether the population dynamics has $T_L$ learning steps, each consisting of $P t_{BP} + P t_L$ update iterations. 
In practice we set $t_{BP} = 10$ and $t_L$ = … 